Asymmetric Gradient Boosting with Application to Spam Filtering

Authors

  • Jingrui He
  • Bo Thiesson
Abstract

In this paper, we propose a new asymmetric boosting method, Boosting with Different Costs. Traditional boosting methods assume the same cost for misclassified instances from different classes, and in this way focus on good performance with respect to overall accuracy. Our method is more generic, and is designed to be more suitable for problems where the major concern is a low false positive (or negative) rate, such as spam filtering. Experimental results on a large-scale email spam data set demonstrate the superiority of our method over state-of-the-art techniques.
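The abstract does not spell out the Boosting with Different Costs algorithm, so the following is only a minimal sketch of the underlying idea of unequal misclassification costs, not the authors' method. It assumes a classifier that already outputs spam scores, and the names `weighted_cost`, `best_threshold`, `cost_fp`, and `cost_fn` are hypothetical, introduced here for illustration. It shows how charging more for a false positive (legitimate mail flagged as spam) shifts the preferred decision threshold.

```python
def false_positive_rate(y_true, y_pred):
    """Fraction of legitimate messages (label 0) that were flagged as spam (label 1)."""
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(negatives) / len(negatives) if negatives else 0.0


def weighted_cost(y_true, y_pred, cost_fp=1.0, cost_fn=1.0):
    """Total misclassification cost with separate (assumed) costs per error type."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 0 and p == 1:      # false positive: legitimate mail marked as spam
            total += cost_fp
        elif t == 1 and p == 0:    # false negative: spam let through
            total += cost_fn
    return total


def predict(scores, threshold):
    """Label a message as spam (1) when its score reaches the threshold."""
    return [int(s >= threshold) for s in scores]


def best_threshold(scores, y_true, cost_fp=1.0, cost_fn=1.0):
    """Pick the candidate threshold with the lowest weighted cost (first one on ties)."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda th: weighted_cost(y_true, predict(scores, th), cost_fp, cost_fn),
    )


# Toy data (made up for illustration): classifier scores and true labels.
scores = [0.1, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]

symmetric = best_threshold(scores, labels)                 # equal costs
asymmetric = best_threshold(scores, labels, cost_fp=10.0)  # false positives cost 10x

print(symmetric, false_positive_rate(labels, predict(scores, symmetric)))
print(asymmetric, false_positive_rate(labels, predict(scores, asymmetric)))
```

On this toy data the equal-cost setting accepts one false positive, while the higher false-positive cost moves the threshold up until no legitimate message is flagged, at the price of missing one spam message.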


Similar resources

Research on E-mail Filtering Based On Improved Bayesian

The Naïve Bayesian classifier has been widely used in spam filters because it is simple and classifies texts accurately and quickly. However, in the process of classifying and filtering, the traditional method doesn't consider the different features of spam and legitimate mail, and it also doesn't take into account the loss of misclassifying legitimate mail as spam, so there are ...


Ensemble of SVM Classifiers for Spam Filtering

Unsolicited commercial email, also known as spam, is becoming a serious problem for Internet users and providers (Fawcett, 2003). Several researchers have applied machine learning techniques in order to improve the detection of spam messages. Naive Bayes models are the most popular (Androutsopoulos, 2000) but other authors have applied Support Vector Machines (SVM) (Drucker, 1999), boosting and d...


Spam Source Clustering by Constructing Spammer Network with Correlation Measure

Spam filtering is one of the most challenging problems in electronic message systems. In general, recent studies on identifying the real source of spam are based on content filtering, because spammers usually falsify their origin. We propose a method to identify spam sources based on structural analysis with a complex network. We assume that each spam source either has the same victim list or uses the same s...


A Comparative Performance Study of Feature Selection Methods for the Anti-spam Filtering Domain

In this paper we analyse the strengths and weaknesses of the most widely used feature selection methods in text categorization when they are applied to the spam problem domain. Several experiments with different feature selection methods and content-based filtering techniques are carried out and discussed. Information Gain, χ²-test, Mutual Information and Document Frequency feature selection methods ...


BiBoost for Asymmetric Learning

Although boosting methods have become an extremely important classification method, there has been little attention paid to boosting with asymmetric losses. In this paper we take a gradient descent view of boosting in order to motivate a new boosting variant called BiBoost which treats the two classes differently. This variant is likely to perform well when there is a different cost for false p...



Publication date: 2007